SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs

نویسندگان

  • Phasit Charoenkwan
  • Watshara Shoombuatong
  • Hua-Chin Lee
  • Jeerayut Chaijaruwanich
  • Hui-Ling Huang
  • Shinn-Ying Ho
چکیده

Existing methods for predicting protein crystallization obtain high accuracy using various types of complemented features and complex ensemble classifiers, such as support vector machine (SVM) and Random Forest classifiers. It is desirable to develop a simple and easily interpretable prediction method with informative sequence features to provide insights into protein crystallization. This study proposes an ensemble method, SCMCRYS, to predict protein crystallization, for which each classifier is built by using a scoring card method (SCM) with estimating propensity scores of p-collocated amino acid (AA) pairs (p=0 for a dipeptide). The SCM classifier determines the crystallization of a sequence according to a weighted-sum score. The weights are the composition of the p-collocated AA pairs, and the propensity scores of these AA pairs are estimated using a statistic with optimization approach. SCMCRYS predicts the crystallization using a simple voting method from a number of SCM classifiers. The experimental results show that the single SCM classifier utilizing dipeptide composition with accuracy of 73.90% is comparable to the best previously-developed SVM-based classifier, SVM_POLY (74.6%), and our proposed SVM-based classifier utilizing the same dipeptide composition (77.55%). The SCMCRYS method with accuracy of 76.1% is comparable to the state-of-the-art ensemble methods PPCpred (76.8%) and RFCRYS (80.0%), which used the SVM and Random Forest classifiers, respectively. This study also investigates mutagenesis analysis based on SCM and the result reveals the hypothesis that the mutagenesis of surface residues Ala and Cys has large and small probabilities of enhancing protein crystallizability considering the estimated scores of crystallizability and solubility, melting point, molecular weight and conformational entropy of amino acids in a generalized condition. The propensity scores of amino acids and dipeptides for estimating the protein crystallizability can aid biologists in designing mutation of surface residues to enhance protein crystallizability. The source code of SCMCRYS is available at http://iclab.life.nctu.edu.tw/SCMCRYS/.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Propensity based classification: Dehalogenase and non-dehalogenase enzymes

The present work was designed to classify and differentiate between the dehalogenase enzyme to non–dehalogenases (other hydrolases) by taking the amino acid propensity at the core, surface and both the parts. The data sets were made on an individual basis by selecting the 3D structures of protein available in the PDB (Protein Data Bank). The prediction of the core amino acid were predicted by I...

متن کامل

Propensity Scores for Prediction and Characterization of Bioluminescent Proteins from Sequences

Bioluminescent proteins (BLPs) are a class of proteins with various mechanisms of light emission such as bioluminescence and fluorescence from luminous organisms. While valuable for commercial and medical applications, identification of BLPs, including luciferases and fluorescent proteins (FPs), is rather challenging, owing to their high variety of protein sequences. Moreover, characterization ...

متن کامل

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

متن کامل

A Crystallization-Induced Asymmetric Transformation using Racemic Phenyl Alanine Methyl Ester Derivatives as Versatile Precursors to Prepare Amino Acids

L-Tyrosine and L-Dopa are the precursors in the biological synthesis of amine neurotransmitters. On the other hand, phenylalanine as an aromatic amino acid (AAA) is a precursor in the synthesis of L-Tyrosine and L-Dopa. For some substrates such as amino acids, resolution by the formation of diastereomers offers an attractive alternative. Among different methods in this case, crystallization-ind...

متن کامل

Protein Sequence Alignment and Database Scanning

In the context of protein structure prediction, there are two principle reasons for comparing and aligning protein sequences: (a) To obtain an accurate alignment. This may be for protein modelling by comparison to proteins of known three-dimensional structure. (b) To scan a database with a newly determined protein sequence and identify possible functions for the protein by analogy with well-cha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2013